Genes differentially expressed in secretory versus proliferative endometrium

ABSTRACT

The present invention compares expression profiles from matched samples to identify differential gene expression. Samples are matched according to physiological, pharmacological and/or disease state. Comparison of matched samples eliminates gene expression differences that are the result of changes in variables that are not of interest. The gene expression differences that remain can be attributed with a high degree of confidence to the unmatched variation. The gene expression differences thus identified can be used for example to diagnose disease, identify physiological state, design drugs, and monitor therapies.

RELATED APPLICATIONS

This application is related to and claims the priority date of U.S. provisional application entitled Genes Differentially Expressed in Secretory vs Proliferative Endometrium, Ser. No. 60/193,719, filed on Mar. 31, 2000, U.S. provisional application entitled Comparison of Matched Expression Profiles, Ser. No. 60/231,367, filed Sep. 8, 2000, and U.S. provisional application entitled Genes Differentially Expressed in Secretory Versus Proliferative Endometrium, Ser. No. 60/240,678, filed on Oct. 13, 2000, all of which are hereby incorporated by reference in their entirety for all purposes.

BACKGROUND

Many cellular events and processes are characterized by altered expression levels of one or more genes. Differences in gene expression correlate with many physiological processes such as cell cycle progression, cell differentiation and cell death. Changes in gene expression patterns also correlate with changes in disease or pharmacological state. For example, the lack of sufficient expression of functional tumor suppressor genes and/or the overexpression of oncogenes/proto-oncogenes could lead to tumorigenesis (Marshall, Cell, 64: 313-326 (1991); Weinberg, Science, 254: 1138-1146 (1991), incorporated herein by reference for all purposes). Thus, changes in the expression levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for different physiological, pharmacological and disease states.

Gene expression profiles produce a snapshot that reflects the biological status of the sample, but in many circumstances biological status will reflect more than one characteristic of the sample. For example, when comparing tumor samples from two patients, there will be changes that correlate with differences between the states of the tumors as well as changes that correlate with the different physiological states of the two patients. One aspect of the current invention is directed at identifying genes that are differentially expressed between two biological states as being further correlated with disease, physiological or pharmacological state.

SUMMARY OF THE INVENTION

The present invention is a method to analyze samples that differ from one another in multiple variables in such a way as to account for the variables and to focus on elements that are under investigation, such as disease state, for example. Comparison of matched samples eliminates gene expression differences that are the result of changes in variables that are not of interest. The gene expression differences that remain can be attributed with a high degree of confidence to the unmatched variation. The gene expression differences thus identified can be used, for example, to diagnose disease, identify physiological state, design drugs, and monitor therapies.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

This application relies on, and cites the disclosure of, other patent applications and literature references. These documents are hereby incorporated by reference in their entireties for all purposes. The practice of the present invention may employ, unless otherwise indicated, conventional techniques of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology, which are within the skill of the art. Such conventional techniques include polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the example hereinbelow. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), Using Antibodies: A Laboratory Manual, Cells: A Laboratory Manual, PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press), all of which are herein incorporated in their entirety by reference.

This section presents a detailed description of the preferred invention and its application. This description is by way of several exemplary illustrations, in increasing detail and specificity, and of the general methods of this invention. These examples are non-limiting, and related variants that will be apparent to one of skill in the art are intended to be encompassed by the appended claims. Following these examples are descriptions of embodiments of the data gathering steps that accompany the general methods.

Description of Concepts

Nucleic acids according to the present invention may include any polymer or oligomer of pyrimidine and purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively. See Albert L. Lehninger, PRINCIPLES OF BIOCHEMISTRY, at 793-800 (Worth Pub. 1982). Indeed, the present invention contemplates any deoxyribonucleotide, ribonucleotide or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated or glucosylated forms of these bases, and the like. The polymers or oligomers may be heterogeneous or homogeneous in composition, and may be isolated from naturally-occurring sources or may be artificially or synthetically produced. In addition, the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states. Oligonucleotide and polynucleotide are included in this definition and relate to two or more nucleic acids in a polynucleotide.

Peptide. A polymer in which the monomers are alpha amino acids that are joined together through amide bonds, alternatively referred to as a polypeptide. In the context of this specification it should be appreciated that the amino acids may be, for example, the L-optical isomer or the D-optical isomer. Peptides are often two or more amino acid monomers long, and often 4 or more amino acids long, often 5 or more amino acids long, often 10 or more amino acids long, often 15 or more amino acids long, and often 20 or more amino acid monomers long, for example. Standard abbreviations for amino acids are used (e.g., P for proline). These abbreviations are included in Stryer, Biochemistry, Third Ed., 1988, which is incorporated herein by reference for all purposes.

Array. An array comprises a solid support with peptide or nucleic acid probes attached to said support. Arrays typically comprise a plurality of different nucleic acid or peptide probes that are coupled to a surface of a substrate in different, known locations. These arrays, also described as “microarrays” or colloquially “chips”, have been generally described in the art, for example, in U.S. Pat. Nos. 5,143,854, 5,445,934, 5,744,305, 5,677,195, 6,040,193, 5,424,186 and Fodor et al., Science, 251: 767-777 (1991), each of which is incorporated by reference in its entirety for all purposes. These arrays may generally be produced using mechanical synthesis methods or light directed synthesis methods which incorporate a combination of photolithographic methods and solid phase synthesis methods. Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, e.g., U.S. Pat. No. 5,384,261, incorporated herein by reference in its entirety for all purposes. Although a planar array surface is preferred, the array may be fabricated on a surface of virtually any shape or even a multiplicity of surfaces. Arrays may be peptides or nucleic acids on beads, gels, polymeric surfaces, fibers such as fiber optics, glass or any other appropriate substrate; see U.S. Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992, which are hereby incorporated in their entirety for all purposes. Arrays may be packaged in such a manner as to allow for diagnostics or other manipulation of an all inclusive device; see, for example, U.S. Pat. Nos. 5,856,174 and 5,922,591, incorporated in their entirety by reference for all purposes. See also U.S. patent application Ser. No. 09/545,207, filed Apr. 7, 2000, for additional information concerning arrays, their manufacture, and their characteristics. It is hereby incorporated by reference in its entirety for all purposes.

Physiological state or physiological status. According to the present invention, a physiological state refers to any normal biological state of a cell or organism. The parameters that are considered in determining physiological state include but are not limited to age, gender, ethnic origin, and reproductive state, which includes, but is not limited to, menstrual state, post-partum, pregnancy, lactation, and nulliparity. For the purposes of this invention the physiological state may be determined by a single indicator. For example, the age of a patient may be the only indicator of physiological state used to categorize a reference sample. Preferably several indicators of physiological state will be used to categorize a reference sample. Methods to determine the physiological state of a sample include but are not limited to measuring the abundance and/or activity of cellular constituents (expression profile, genotyping), morphological phenotype, or interview of the subject.

Physiological state can refer to, but is not limited to, the physiological state of an organism, an organ, a tissue, a collection of cells or an individual cell. In a preferred embodiment, the physiological state refers to the physiological state of a whole organism. In another embodiment physiological state refers to the physiological state of a tissue, for example the physiological state of the uterine lining.

Disease state or disease status. In addition to a physiological state, a sample may or may not be affected with a disease state. According to the present invention, a disease state refers to any abnormal biological state of a cell. This includes but is not limited to an interruption, cessation or disorder of body functions, systems or organs. In general, a disease state will be detrimental to a biological system. With respect to the present invention, any biological state, such as a premalignancy state, that is associated with a disease or disorder is considered to be a disease state. A pathological state is the equivalent of a disease state.

Disease states can be further categorized into different levels of disease state. As used in the present invention, the level of a disease or disease state is an arbitrary measure reflecting the progression of a disease or disease state. Generally, a disease or disease state will progress through a plurality of levels or stages, wherein the effects of the disease become increasingly severe. The level of a disease state may be impacted by the physiological state of the sample.

Therapy or therapeutic regimen. In order to alleviate or alter a disease state, a therapy or therapeutic regimen is often undertaken. A therapy or therapeutic regimen, as used herein, refers to a course of treatment intended to reduce or eliminate the effects or symptoms of a disease. A therapeutic regimen will typically comprise, but is not limited to, a prescribed dosage of one or more drugs or surgery. Therapies, ideally, will be beneficial and reduce the disease state, but in many instances a therapy will have non-desirable effects as well. The effect of therapy will also be impacted by the physiological state of the sample.

Pharmacological state or pharmacological status. Treatment with drugs may affect the pharmacological state of a sample. The pharmacological state of a sample relates to changes in the biological status following drug treatment. Some of the changes following drug treatment or surgery may be relevant to the disease state. Some may be unrelated side effects of the therapy. Some will be specific to physiological state. Indicators of pharmacological state include, but are not limited to, duration of therapy, types and doses of drugs prescribed, degree of compliance with a given course of therapy, and/or unprescribed drugs ingested.

Biological state or biological status. According to the present invention, the biological state of a sample refers to the state of a collection of cellular constituents or any other observable phenotype, which is sufficient to characterize the sample for an intended purpose. The biological state reflects the physiological state of a sample, any disease state that affects the sample and the pharmacological state if applicable. Some methods to determine the biological state of a sample include but are not limited to measuring the abundance and/or activity of cellular constituents, characterizing according to morphological phenotype or a combination of the above methods.

The biological status of a sample can be measured or observed by interrogating the abundances and/or activities of a collection of cellular constituents. In various embodiments, this invention includes making such measurements and/or observations on different collections of cellular constituents.

Expression profile. One measurement of cellular constituents that is particularly useful in the present invention is the expression profile. As used herein, an “expression profile” comprises measurement of the relative abundance of a plurality of cellular constituents. Such measurements may include RNA or protein abundances or activity levels. The expression profile can be a measurement, for example, of the transcriptional state or the translational state. See U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,033,860 and U.S. Ser. No. 09/341,302, which are hereby incorporated by reference in their entireties.

Transcriptional state. The transcriptional state of a sample includes the identities and relative abundances of the RNA species, especially mRNAs, present in the sample. Preferably, a substantial fraction of all constituent RNA species in the sample are measured, but at least a sufficient fraction is measured to characterize the state of the sample. The transcriptional state is the currently preferred aspect of the biological state measured in this invention. It can be conveniently determined by measuring transcript abundances by any of several existing gene expression technologies.

Translational state. Translational state includes the identities and relative abundances of the constituent protein species in the sample. As is known to those of skill in the art, the transcriptional state and translational state are related.

The gene expression monitoring system, in a preferred embodiment, may comprise a nucleic acid probe array (such as those described above), membrane blot (such as used in hybridization analysis such as Northern, Southern, dot, and the like), or microwells, sample tubes, gels, beads or fibers (or any solid support comprising bound nucleic acids). See U.S. Pat. Nos. 5,770,722, 5,874,219, 5,744,305, 5,677,195 and 5,445,934, which are expressly incorporated herein by reference. See also Examples, infra. The gene expression monitoring system may also comprise nucleic acid probes in solution.

The gene expression monitoring system according to the present invention may be used to facilitate a comparative analysis of expression in different cells or tissues, different subpopulations of the same cells or tissues, different physiological states of the same cells or tissue, different developmental stages of the same cells or tissue, or different cell populations of the same tissue.

Differentially expressed. The term differentially expressed as used herein means that the measurement of a cellular constituent varies in two or more samples. The cellular constituent can be either upregulated in the experimental sample relative to the reference or downregulated in the experimental sample relative to the reference. Differential gene expression can also be used to distinguish between cell types or nucleic acids. See U.S. Pat. No. 5,800,992.
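By way of illustration only, the definition above can be sketched in Python. The function name `direction`, the use of a log2 ratio, and the default two-fold cutoff are assumptions made for this example; the specification does not prescribe a particular statistic or threshold for this definition.

```python
import math


def direction(experimental: float, reference: float, min_log2: float = 1.0) -> str:
    """Classify one cellular constituent relative to the reference sample.

    min_log2 is an illustrative cutoff (1.0 means a two-fold change);
    it is an assumption of this sketch, not a value taken from the text.
    """
    if experimental <= 0 or reference <= 0:
        raise ValueError("abundances must be positive")
    log_ratio = math.log2(experimental / reference)
    if log_ratio >= min_log2:
        return "upregulated"
    if log_ratio <= -min_log2:
        return "downregulated"
    return "unchanged"
```

A log ratio is used here so that a four-fold increase and a four-fold decrease are treated symmetrically; other measures of variation could be substituted.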

General

The comparison of gene expression profiles from an experimental sample and a reference sample to identify genes that are differentially expressed between two or more different biological states in the same cell type has become a powerful diagnostic and prognostic tool. Genes identified through this method can be used as markers for the presence or level of a disease, as prognostic devices to monitor efficacy of treatment regimens and as targets for drug design.

The availability of comprehensive methods to analyze gene expression patterns for a large number of genes simultaneously has led to a flood of reports describing the expression profiles associated with an increasingly comprehensive set of biological states. The yeast, Saccharomyces cerevisiae, has been the subject of a majority of these reports because of the availability of the entire genomic sequence, the relatively small size of the yeast genome and the relative ease with which different experimental conditions can be tested.

The yeast system has been used to assay changes in gene expression associated with a variety of different physiological, developmental, disease and pharmacological conditions. These include rich versus minimal media (see Wodicka et al., Nat Biotechnol 15: 1359-1367 (1997)), progression through the mitotic cell cycle (see Cho et al., Mol Cell 2: 65-73 (1998)), response to mutation (see Holstege et al., Cell 95: 717-728 (1998)), cellular response to DNA damage (see Jelinsky and Sampson, Proc Natl Acad Sci USA 96: 1486-1491 (1999)) and pseudohyphal formation under conditions of nitrogen starvation (see Madhani et al., Proc Natl Acad Sci USA 96: 12530-12535 (1999)), all of which are incorporated herein by reference for all purposes.

In each of these reports the approach has been to compare expression profiles from an experimental sample and a reference sample under a given set of conditions following a change in experimental conditions. Ideally a single variable is changed between the reference and experimental samples, allowing any observed changes to be attributed to the single changed variable. When an experimental sample is compared to a reference sample under less controlled circumstances, the differential expression that is observed can result from either the changed experimental condition or from another difference between the two samples.

It is relatively simple to know and control many variables, such as genotype and environmental conditions such as temperature, aeration, nutrient availability, and stress conditions, when studying a model organism such as yeast in a laboratory environment. However, this approach decreases in utility as the subject increases in complexity and it becomes increasingly difficult to identify and control all variables. In a particularly preferred embodiment the subject of the present invention is human, and one skilled in the art will recognize that it is difficult to identify and/or control variables when the subject is human.

Differential gene expression analysis experiments have been done in higher eukaryotes, but they are typically restricted to those experiments in which variation between reference and experimental samples can be minimized. Differential gene expression experiments have been done in mice to identify variation resulting from aging and from reduced caloric intake. See Lee et al., Science 285: 1390-1393 (1999), which is incorporated herein by reference for all purposes. In order to limit variation not attributable to the experimental conditions, all mice used in the experiments were males of the same strain maintained under identical housing and feeding conditions.

Gene expression studies have also been done in humans to identify differences in gene expression in diseased samples. In studies of humans, researchers have taken several approaches to minimize variability between experimental and reference samples. Growing cells in culture is one method that researchers have used to model biological responses of human cells. For example, the changes in expression profiles in cultured human fibroblasts in response to human cytomegalovirus infection have been characterized using DNA array technology. See Zhu et al., Proc Natl Acad Sci USA 95: 14470-14475 (1998), which is incorporated herein by reference for all purposes. Because of the differences between in vitro cell culture and samples derived in vivo, this type of in vitro examination of gene expression is recognized by those of skill in the art to represent a highly useful but potentially distorted and incomplete picture of a normal response.

When expression profile comparisons are done using primary patient material rather than cell culture, extra steps can sometimes be taken to minimize variation resulting from unknown or uncontrolled differences between the experimental and reference samples. Typically reference and experimental samples are matched by isolating both from the same tissue type and often from the same patient in a single procedure. For example, genes that were differentially expressed in colon tumors were identified by comparing expression data from colon tumors and normal colon. See Zhang et al., Science 276: 1268-1272, which is incorporated herein by reference for all purposes. However, both of these approaches have limitations. If a patient has a disease that affects the entire organism or an entire organ, it will not be possible to generate a normal sample from this individual. Even if an apparently normal sample can be obtained from the same individual, it is possible that the sample will be affected by the disease. When a normal sample is from an individual distinct from the patient, there will be differences that are attributable to differences in the physiological, pharmacological or disease states of the two individuals.

Another approach has been to compare expression profiles from a collection of samples to identify differences in gene expression between two states that consistently correlate with one state or the other, in order to identify genes that could be used to predict the state of an unknown sample as being one of the two states. See Golub et al., Science 286: 531-537 (1999), which is incorporated herein by reference for all purposes. One limitation to this approach is that only genes that consistently correlate with one state or another are useful. For example, genes that are differentially expressed only in a subset of samples would not be useful. This subset may represent samples that share another aspect of biological state, such as a common physiological state.

A preferred aspect of the present invention describes a novel approach to compare expression profiles and to derive useful information about genes that are differentially expressed in response to a change in a specific variable even when it is not practical or possible to control changes in other variables. The current invention facilitates separation of differential gene expression data into physiological, disease and/or pharmacological components.

Changes in a biological system, whether the result of a disease state or normal physiological variation, will affect many constituents of a sample. In particular, as a result of regulatory, homeostatic, and/or compensatory networks and systems present in biological systems, even the direct disruption of only a single constituent can have complicated and often unpredictable effects on other constituents.

Alteration of the activity or level of a single, hypothetical protein, such as protein P, is considered herein as an example. Although the activity of only protein P is directly disrupted, additional cellular constituents that are inhibited or stimulated by protein P, or which are elevated or diminished to compensate for the loss of protein P activity, will also be affected. Still other cellular constituents will be affected by changes in the levels or activity of the second tier constituents, and so on.

As a further example, consider a sample in which the activity of two hypothetical proteins, P1 and P2, has been altered, the alteration of P1 resulting from a disease state and the alteration of P2 resulting from a change in physiological state. As in the first example, each alteration will affect a second tier of constituents that will, in turn, affect the levels or activity of a third tier of constituents, and so on. Measurements of the biological state of the sample will detect changes in affected constituents but will not distinguish those that result from the P1 alteration from those that result from the P2 alteration. One aspect of the current invention distinguishes between changes in the expression profile that correlate with the change in P1, resulting from the disease state, and changes that correlate with the change in P2, resulting from the change in physiological state.

Measurement of the transcriptional state of a cell is preferred in this invention because it is relatively easy to measure, it is typically more sensitive than other methods such as morphological characterization, and it can typically be applied more consistently than morphological characterization.

Some disease states can be difficult to identify based on morphological differences, especially at early levels of the disease state. A genetic mutation may result in a dramatic change in the expression levels of a group of genes, but biological systems can compensate for changes by altering the expression of other genes. As a result of these internal compensation responses, many perturbations may have minimal effects on observable phenotypes of the system but profound effects on the composition of cellular constituents.

It will be appreciated by one skilled in the art that samples can be derived from a variety of sources including, but not limited to, single cells, a collection of cells, tissue, cell culture, urine, blood, or other bodily fluids. The tissue or cell source may include a tissue biopsy sample, a cell sorted population, cell culture, or a single cell. In a preferred embodiment, the tissue source may include brain, liver, heart, kidney, lung, spleen, retina, bone, lymph node, endocrine gland, reproductive organ, blood, nerve, vascular tissue, and olfactory epithelium. In one embodiment, eukaryotic tissue is preferred, and in another, mammalian tissue is preferred, and in yet another, human tissue is preferred.

In yet another preferred embodiment, the tissue or cell source may be embryonic or tumorigenic. Tumorigenic tissue according to the present invention may include tissue associated with malignant and pre-neoplastic conditions, including, but not limited to, the following: acute lymphocytic leukemia, acute myelocytic leukemia, myeloblastic leukemia, promyelocytic leukemia, myelomonocytic leukemia, monocytic leukemia, erythroleukemia, chronic myelocytic (granulocytic) leukemia, chronic lymphocytic leukemia, polycythemia vera, lymphoma, Hodgkin's disease, non-Hodgkin's disease, multiple myeloma, Waldenstrom's macroglobulinemia, heavy chain disease, solid tumors, endometrial cancer, ovarian cancer, leiomyoma, fibrosarcoma, myxosarcoma, liposarcoma, chondrosarcoma, osteogenic sarcoma, chordoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing's tumor, leiomyosarcoma, rhabdomyosarcoma, colon carcinoma, pancreatic cancer, breast cancer, ovarian cancer, prostate cancer, squamous cell carcinoma, basal cell carcinoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, cystadenocarcinoma, medullary carcinoma, bronchogenic carcinoma, renal cell carcinoma, hepatoma, bile duct carcinoma, choriocarcinoma, seminoma, embryonal carcinoma, Wilms' tumor, cervical cancer, testicular tumor, lung carcinoma, small cell lung carcinoma, bladder carcinoma, epithelial carcinoma, glioma, astrocytoma, medulloblastoma, craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, meningioma, melanoma, neuroblastoma, and retinoblastoma. See Fishman et al., Medicine, 2d Ed. (J. B. Lippincott Co., Philadelphia, Pa. 1985), hereby incorporated by reference in its entirety for all purposes.

OVERVIEW OF THE METHODS OF THIS INVENTION

It is well understood by those of skill in the art that variation in the global pattern of gene expression underlies much of the phenotypic diversity among cells. Phenotypic diversity includes both normal variation associated with a change in physiological state and abnormal variation associated with a pharmacological or disease state. One aspect of the current invention differentiates between abnormal variation associated with disease and/or pharmacological state and normal variation associated with physiological state.

A preferred embodiment of the current invention matches an experimental sample to one or more reference samples that match the experimental sample in at least one parameter that is a determinant of physiological, pharmacological and/or disease status, and compares the expression profiles of the experimental and reference samples. In a particularly preferred embodiment, the current invention is used to identify genes that are differentially expressed between matched samples. The embodiments of the present invention are also applicable to diagnosing the disease state of a sample, and to characterizing and monitoring disease states. This includes identifying and monitoring the level of a disease state as well as monitoring the effect of therapies on a disease state. The embodiments of the present invention are also applicable to identifying and monitoring drug responses that are specific to a given physiological state. Thus, the present invention is also useful for designing drug therapies that are tailored to the physiological state of a subject. The present invention in one aspect can also be used to identify the physiological or pharmacological state of a sample.

Genes have been identified whose expression varies greatly (preferably more than 4, 10, 15, or 20 fold) between different physiological states. Differences in physiological states between an experimental and reference sample make it difficult to distinguish between genes that are differentially expressed because of the change in physiological state and genes that are differentially expressed because of another difference between the samples. For example, when comparing an experimental sample to a reference sample to identify genes that are differentially expressed in a disease state, it is preferable to match the physiological state of the experimental and reference samples so that any changes in gene expression that are observed can be attributed to the disease state. Similarly, a difference in disease or pharmacological state can obscure differences in physiological state. In many embodiments the current invention matches the physiological, pharmacological and/or disease states of reference and experimental samples before comparing expression profiles.
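The effect of matching can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the gene names are placeholders, the abundance values are invented for the example, and `differentially_expressed` is an illustrative helper rather than a method recited in this specification. The 4-fold cutoff is taken from the lowest of the preferred fold values mentioned above.

```python
def differentially_expressed(experimental, reference, min_fold=4.0):
    """Return genes whose abundance differs by at least min_fold in either direction."""
    hits = {}
    # Only genes measured in both profiles can be compared.
    for gene in experimental.keys() & reference.keys():
        e, r = experimental[gene], reference[gene]
        if e <= 0 or r <= 0:
            continue  # skip non-positive values; a fold change is undefined
        fold = e / r if e >= r else r / e
        if fold >= min_fold:
            hits[gene] = "up" if e > r else "down"
    return hits


# Invented abundances (arbitrary units) for three placeholder genes.
diseased_secretory = {"geneA": 800.0, "geneB": 50.0, "geneC": 1000.0}
normal_secretory = {"geneA": 100.0, "geneB": 55.0, "geneC": 950.0}
normal_proliferative = {"geneA": 100.0, "geneB": 500.0, "geneC": 90.0}

# Reference matched on physiological state: only the disease-related gene remains.
matched = differentially_expressed(diseased_secretory, normal_secretory)
# Unmatched reference: physiological differences are mixed in with the disease signal.
unmatched = differentially_expressed(diseased_secretory, normal_proliferative)
```

In this invented data set, the matched comparison flags only geneA, whereas the unmatched comparison also flags geneB and geneC, whose differences reflect the secretory versus proliferative state rather than disease.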

Determining the physiological, pharmacological and/or disease state of a sample

In one aspect the invention requires the gathering of information about the physiological, pharmacological and/or disease state of a sample. If, for example, the goal is to diagnose disease in an experimental sample from a human patient, one aspect of the invention is to discover information about the physiological and pharmacological state of the sample. Another aspect of the invention is to match the experimental sample to reference samples of similar physiological and pharmacological state. This requires knowledge of the physiological and pharmacological state of the reference sample. In this example, another aspect of the invention is that the reference samples are also of known disease status, to allow diagnosis of the disease state of the experimental sample. Information about physiological state can be gathered in a variety of ways. If the subject is human, the sex can be obtained, for example, through an interview, a visual inspection or through karyotyping.

Information about genotypic state can be derived by sequence analysis. There are a variety of methods, such as array based analysis, standard sequencing techniques, and other commercially available methods.

Information about disease state can also be obtained through a variety of mechanisms such as identification of symptoms or morphological examination of affected tissue. Determinants of disease state include phenotypic symptoms, level of disease, and progress of therapy. It is possible to have more than one disease contributing to the disease state of the sample.

Information about pharmacological state can similarly be obtained through a variety of mechanisms. In some circumstances a subject can be interviewed. Under other circumstances it may be necessary to inspect the medical history of the subject or to assay for evidence of drug use through chemical analysis of blood, urine, skin, saliva or hair.

There may be variation in the expression profiles obtained from samples that apparently share a common physiological state. In some embodiments of the invention the best expression profile to use as a reference sample is an average from a plurality of expression profiles of common physiological state. A phenotypic disease state may alter the expression profile of a physiological state. For example, women with a history of sexual abuse have dramatically altered levels of certain hormones; this would be a disease state that might go clinically undetected.
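The averaging of reference profiles described above can be sketched as follows. This is a minimal illustration only: the representation of an expression profile as a mapping from gene name to expression level, the function name average_profile, and the choice to average only genes present in every profile are assumptions of this sketch and are not specified in the disclosure.

```python
from statistics import mean


def average_profile(profiles):
    """Average several expression profiles gene by gene.

    Each profile is a dict mapping a gene name to an expression level
    (a hypothetical representation). Only genes measured in every
    profile are averaged, which is an assumption of this sketch.
    """
    if not profiles:
        raise ValueError("at least one profile is required")
    # Genes common to all profiles.
    shared = set(profiles[0]).intersection(*profiles[1:])
    return {gene: mean(p[gene] for p in profiles) for gene in sorted(shared)}
```

The resulting averaged profile could then stand in for a single reference sample of the common physiological state.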

Matching Reference Sample(s) to Experimental Sample(s)

According to one aspect of the current invention, samples can be matched by disease state, by physiological state, or by pharmacological state, or any combination of these states. The objective is to minimize differences between the experimental and reference samples. In a particularly preferred embodiment variation between the experimental and reference sample is limited to a single aspect of a disease, physiological or pharmacological state that is being interrogated. In another embodiment the invention removes variation due to one or more indicators of physiological status, pharmacological status or disease status.

In another aspect the invention removes variation due to one or more indicators of physiological status and one or more indicators of pharmacological status. In another aspect the invention removes variation due to one or more indicators of physiological status and one or more indicators of disease status. In another aspect the invention removes variation due to one or more indicators of disease status and one or more indicators of pharmacological status.

In one aspect of the invention the reference sample(s) is selected to match the experimental sample in at least one parameter that is a determinant of physiological state. In this aspect of the invention it is preferable that the reference sample(s) matches the experimental sample in many parameters that are determinants of physiological state. The reference sample and the experimental sample could be from subjects that are similar in age, gender, reproductive status or ethnic origin, any combination of these aspects or other aspects that are determinants of physiological state.

In one aspect of the invention the reference sample is selected to match the experimental sample in at least one aspect of a disease state. In this aspect of the invention it is preferable that the reference sample(s) matches the experimental sample in many parameters that are determinants of disease state.

In another aspect of the invention the reference sample is selected to match the experimental sample in at least one aspect of a pharmacological state. In this aspect of the invention it is preferable that the reference sample(s) matches the experimental sample in many parameters that are determinants of pharmacological state.
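The selection of reference samples that match the experimental sample in at least one parameter, with a preference for matches in many parameters, can be sketched as follows. The record layout (a dict of parameter name to value), the parameter names, and the function name rank_references are hypothetical and chosen only for illustration; exact equality is used as the matching rule, which is a simplification.

```python
def rank_references(experimental, references, parameters, min_matches=1):
    """Return reference records ordered by how many parameters they share
    with the experimental record, most matches first.

    A reference is kept only if it matches in at least min_matches of the
    listed parameters. Matching here means exact equality of values.
    """
    scored = []
    for ref in references:
        n_matched = sum(
            1 for p in parameters
            if p in experimental and ref.get(p) == experimental[p]
        )
        if n_matched >= min_matches:
            scored.append((n_matched, ref))
    # Stable sort: references with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ref for _, ref in scored]
```

Raising min_matches to the number of listed parameters restricts the result to references that match the experimental sample in every listed determinant.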

Identifying Differentially Expressed Genes From Matched Samples

In one embodiment of the invention matched experimental and reference samples are compared to identify differences. Comparisons that can be made include, but are not limited to, diseased to normal from matching physiological state, diseased to diseased from different physiological states, normal to normal from different physiological states, diseased to diseased from the same physiological state, and normal to normal from the same physiological state. A sample of unknown physiological state can be compared to a plurality of samples of known physiological state to identify the physiological state of the sample.

In many embodiments of the current invention, expression profiles will be compared. In a particularly preferred embodiment expression profiles are compared to identify genes that are differentially expressed between the samples. This embodiment of the invention is useful, for example, for identifying genes that are differentially expressed in a diseased and normal sample or in different levels of disease. Genes that are differentially expressed can be used as diagnostic or prognostic markers or drug/therapy targets or indicators of physiological or pharmacological status. They can be used individually or in sets of, for example, 2, 5, 10, 20, 30, 100, 150, 200, 250, 500, or 1,000 or more. The identified genes can be used to design probes for microarrays.
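A comparison of two matched expression profiles to identify differentially expressed genes can be sketched as follows. The fold-change criterion is one possible way to call a gene differentially expressed; the default threshold of 4 echoes the lowest fold value mentioned earlier in this disclosure. The profile representation (dict of gene name to expression level), the floor applied to low values, and the function name are assumptions of this sketch.

```python
def differentially_expressed(experimental, reference, min_fold=4.0, floor=1.0):
    """Return {gene: signed fold change} for genes whose expression differs
    by more than min_fold between two matched profiles.

    Positive values mean higher expression in the experimental sample,
    negative values mean higher expression in the reference sample.
    Levels below floor are raised to floor to avoid division by zero
    (an assumption of this sketch).
    """
    hits = {}
    for gene in experimental.keys() & reference.keys():
        e = max(experimental[gene], floor)
        r = max(reference[gene], floor)
        fold = e / r if e >= r else -(r / e)
        if abs(fold) > min_fold:
            hits[gene] = fold
    return hits
```

Because the samples are matched beforehand, genes returned by such a comparison are more readily attributed to the unmatched variable than to unrelated differences between the samples.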

Diagnosing Disease States

In a particularly preferred embodiment the current invention can be used to diagnose disease. Reference samples are selected to match the experimental sample in physiological and/or pharmacological state and to represent a plurality of different known disease states. The expression profile from the experimental sample is then compared to a plurality of expression profiles from reference samples to identify one or more reference samples that match the expression profile of the experimental sample. The experimental sample is diagnosed with the disease of the matching reference sample(s).
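The comparison of an experimental profile against a plurality of matched reference profiles can be sketched as a nearest-profile search. The disclosure does not specify a similarity measure; Euclidean distance over shared genes is used here purely as an illustrative assumption, and the function name and data layout are hypothetical.

```python
import math


def closest_reference(experimental, references):
    """Find the reference profile closest to the experimental profile.

    references is a list of (disease_label, profile) pairs, where each
    profile is a dict of gene name to expression level. Distance is
    Euclidean over the genes present in both profiles (an assumption).
    Returns (label, distance), or (None, inf) if nothing is comparable.
    """
    best_label, best_dist = None, math.inf
    for label, profile in references:
        genes = experimental.keys() & profile.keys()
        if not genes:
            continue  # no genes in common, cannot compare
        dist = math.sqrt(
            sum((experimental[g] - profile[g]) ** 2 for g in genes)
        )
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist
```

In this sketch the experimental sample would be assigned the disease label of the closest reference; in practice a distance cutoff or several near matches might be considered before drawing a conclusion.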

In one aspect of the invention the disease states represented are selected from a subset of diseases that match one or more symptoms in the experimental sample. For example, if the experimental sample is from a 30-year-old female patient with difficulty becoming pregnant, samples from 30-year-old females diagnosed with specific forms of infertility can be chosen as reference samples.

Monitoring Disease States

Following the diagnosis of a particular disease in a patient or subject it is often useful to obtain information about the level of the disease state. If diagnosis is followed by therapy it is also often useful to obtain information about the level of the disease state during and after therapy.

In one aspect the invention is used to identify or characterize the stage of a tumor. Tumorigenic experimental samples are compared to reference samples that are matched to the experimental sample in one or more indicators of physiological or pharmacological status. Reference samples with well characterized tumors are selected. Comparison can be of morphological features or of other biological readouts including expression profiles. The present method stages tumors by comparison to reference samples of matching physiological and/or pharmacological state, thus eliminating gene expression differences that result from differences in physiological and/or pharmacological state that may not be relevant to the disease state.

In one embodiment the present invention can be used for monitoring the disease state of a subject undergoing one or more therapies. This requires the comparison of a sample before treatment with samples following treatment. There may be changes to the physiological state of the patient that occur over the course of the therapy that are unrelated to the therapy. When comparing a sample before treatment to a sample after treatment it will be preferable to identify changes between the samples that result from a change in physiological state. In one aspect the current invention identifies changes that are the result of physiological change rather than therapeutic intervention.

Identifying and Monitoring Drug Responses that are Specific to Physiological State

The current invention can be used to correlate differences in drug efficacy with differences in physiological state. Some drug therapies are highly effective in one patient but ineffective or deleterious in another patient. Differences in drug efficacy may correlate with differences in genotypic state, disease state or physiological state.

The current invention can be used to identify changes in gene expression following drug treatment that are specific to a physiological state. This could facilitate the discovery/design of therapies that are specific for the physiological state of the patient.

Drug therapies will have different effects depending on the physiological status of the subject. Some drug therapies have different side effects in different physiological states. Some drug therapies have different efficacies in men and women; in particular, many are less effective in women than in men. In a preferred embodiment the current invention is used to identify drug effects that are specific to women. In another preferred embodiment the method is used to identify drug effects that are specific to men.

The invention can also be used to identify therapeutic regimens that are optimized for the physiological state of the patient. Therapeutic treatments ideally impart maximal disease reduction with minimal adverse side effects, but many therapeutic treatments do have undesirable side effects. These side effects may be specific to the physiological state of the sample. The current invention could be used as a tool to design therapeutic regimens that are specific for the physiological state of the subject.

Identifying the Physiological State of an Experimental Sample

A sample for which relatively little is known about the subject from which it was obtained could be compared to a plurality of expression profiles of known physiological status in order to determine the physiological status of the subject.

For example, a blood or semen sample isolated from a crime scene could be used to obtain information about the physiological status of the criminal, such as age and ethnic origin.

Specific Applications

Those skilled in the art will recognize that in a preferred embodiment, the expression profiles from the reference samples will be input to a database. A relational database is preferred, but one of skill in the art will recognize that other databases could be used. A relational database is a set of tables containing data fitted into predefined categories. Each table, or relation, contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. For example, a typical database for the invention would include a table that describes a sample with columns for age, gender, reproductive status, expression profile and so forth. Another table would describe a disease with columns for symptoms, level, sample identification, expression profile and so forth. See U.S. Ser. No. 09/354,935, which is hereby incorporated by reference in its entirety for all purposes.
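The two tables described above can be sketched with an in-memory SQLite database. The table names, column types, the storage of the expression profile as text, and the helper function names are all assumptions made for illustration; the disclosure names only the column categories.

```python
import sqlite3


def build_reference_db():
    """Create an in-memory database with a sample table and a disease table.

    Column names follow the categories named in the text; everything else
    (types, keys, profile stored as text) is a hypothetical choice.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sample (
            sample_id           INTEGER PRIMARY KEY,
            age                 INTEGER,
            gender              TEXT,
            reproductive_status TEXT,
            expression_profile  TEXT
        );
        CREATE TABLE disease (
            disease_id INTEGER PRIMARY KEY,
            sample_id  INTEGER NOT NULL REFERENCES sample(sample_id),
            symptoms   TEXT,
            level      TEXT
        );
        """
    )
    return conn


def matching_samples(conn, gender, min_age, max_age):
    """Return ids of reference samples matching gender and an age window."""
    rows = conn.execute(
        "SELECT sample_id FROM sample "
        "WHERE gender = ? AND age BETWEEN ? AND ? ORDER BY sample_id",
        (gender, min_age, max_age),
    ).fetchall()
    return [row[0] for row in rows]
```

A query such as matching_samples(conn, "F", 28, 32) would then return the reference samples that match an experimental sample from a 30-year-old female in gender and approximate age.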

In one embodiment the invention matches the experimental sample to a database of reference samples. The database is assembled with a plurality of different samples to be used as reference samples. An individual reference sample in one embodiment will be obtained from a patient during a visit to a medical professional. The sample could be, for example, a tissue, blood, urine, feces or saliva sample. Information about the physiological, disease and/or pharmacological status of the sample will also be obtained through any method available. This may include, but is not limited to, expression profile analysis, clinical analysis, medical history and/or patient interview. For example, the patient could be interviewed to determine age, sex, ethnic origin, symptoms or past diagnosis of disease, and the identity of any therapies the patient is currently undergoing. A plurality of these reference samples will be taken. A single individual may contribute a single reference sample or more than one sample over time. One skilled in the art will recognize that confidence levels in predictions based on comparison to a database increase as the number of reference samples in the database increases. One skilled in the art will also recognize that some of the indicators of status will be determined by less precise means; for example, information obtained from a patient interview is limited by the subjective interpretation of the patient. Additionally, a patient may lie about age or lack sufficient knowledge to provide accurate information about ethnic origin or other indicators. Descriptions of the severity of disease symptoms are a particularly subjective and unreliable indicator of disease status.

The database is organized into groups of reference samples. Each reference sample contains information about physiological, pharmacological and/or disease status. In one aspect the database is a relational database with data organized in three data tables, one where the samples are grouped primarily by physiological status, one where the samples are grouped primarily by disease status and one where the samples are grouped primarily by pharmacological status. Within each table the samples can be further grouped according to the two remaining categories. For example, the physiological status table could be further categorized according to disease and pharmacological status.

As will be appreciated by one of skill in the art, the present invention may be embodied as a method, data processing system or program products. Examples of computer programs and databases are shown in U.S. Ser. Nos. 09/354,935, 08/828,952, 09/341,302, 09/397,494, 60/220,587, and 60/220,645, which are hereby incorporated by reference in their entireties for all purposes.

Accordingly, the present invention may take the form of data analysis systems, methods, analysis software and the like. Software written according to the present invention is to be stored in some form of computer readable medium, such as memory, hard-drive, DVD ROM or CD ROM, or transmitted over a network, and executed by a processor. The present invention also provides a computer system for analyzing physiological states, levels of disease states and/or therapeutic efficacy. The computer system comprises a processor, and memory coupled to said processor which encodes one or more programs. The programs encoded in memory cause the processor to perform the steps of the above methods wherein the expression profiles and information about physiological, pharmacological and disease states are received by the computer system as input.

U.S. Pat. No. 5,733,729 illustrates an example of a computer system that may be used to execute the software of an embodiment of the invention. This patent shows a computer system that includes a display, screen, cabinet, keyboard, and mouse. The mouse may have one or more buttons for interacting with a graphic user interface. The cabinet preferably houses a CD-ROM or DVD-ROM drive, system memory and a hard drive which may be utilized to store and retrieve software programs incorporating computer code that implements the invention, data for use with the invention and the like. Although a CD is shown as an exemplary computer readable medium, other computer readable storage media including floppy disk, tape, flash memory, system memory, and hard drive may be utilized. Additionally, a data signal embodied in a carrier wave (e.g., in a network including the internet) may be the computer readable storage medium.

The patent also shows a system block diagram of a computer system used to execute the software of an embodiment of the invention. The computer system includes a monitor, keyboard, and mouse. The computer system further includes subsystems such as a central processor, system memory, fixed storage (e.g., hard drive), removable storage (e.g., CD-ROM), display adapter, sound card, speakers, and network interface. Other computer systems suitable for use with the invention may include additional or fewer subsystems. For example, another computer system may include more than one processor or a cache memory. Computer systems suitable for use with the invention may also be embedded in a measurement instrument. The embedded systems may control the operation of, for example, a GeneChip® Probe array scanner as well as executing computer codes of the invention.

Computer methods can be used to measure the variables and to match samples to eliminate gene expression differences that are a result of differences that are not of interest. For example, a plurality of values can be input into computer code for one or more of a physiological, pharmacological or disease state. The computer code can thereafter measure the differences or similarities between the values to eliminate changes not attributable to a value of interest. Examples of computer programs and databases that can be used for this purpose are shown in U.S. Ser. Nos. 09/354,935, 08/828,952, 09/341,302, 09/397,494, 60/220,587, and 60/220,645, which are hereby incorporated by reference in their entireties.

In one aspect of the invention, microarrays will be used to measure expression profiles. Microarrays are particularly well suited because of the reproducibility between different experiments. DNA microarrays provide one method for the simultaneous measurement of the expression levels of large numbers of genes. Each array consists of a reproducible pattern of thousands of different DNAs attached to a solid support. Labeled RNA or DNA is hybridized to complementary probes on the array and then detected by laser scanning. Hybridization intensities for each probe on the array are determined and converted to a quantitative read-out of relative gene expression levels. The data can be further analyzed to identify expression patterns and variation that correlates with the biological state of the sample (see U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135 and 6,033,860 and U.S. Ser. No. 09/341,302, which are incorporated herein by reference).

High-density oligonucleotide arrays are particularly useful for monitoring the gene expression pattern of a sample. In one approach, total mRNA isolated from the sample is converted to labeled cRNA and then hybridized to an array such as a GeneChip® oligonucleotide array. Each sample is hybridized to a separate array. Relative transcript levels are calculated by reference to appropriate controls present on the array and in the sample. See Mahadevappa, M. & Warrington, J. A., Nat. Biotechnol. 17, 1134-1136 (1999), which is hereby incorporated by reference in its entirety for all purposes.

Characterization of Biological Status in Females

The current invention is particularly useful when applied to analysis of experimental samples from female subjects. Women differ from men in the physiological indicator of gender, which contributes to an as yet uncharacterized level of differential gene expression. In addition, there is a tremendous amount of normal variation between female subjects and between different samples from the same female subject. In particular, the female reproductive system and the menstrual cycle add an additional level of physiological variation to the analysis of samples derived from female subjects. As part of a monthly cycle the lining of the female uterus, the endometrium, undergoes a cycle of controlled tissue remodeling unparalleled in other organs. This cycle is presumably driven by changes in gene expression.

Physiological variation between women and men complicates the design of effective therapies for women and the monitoring of therapeutic treatments in women. It is currently well accepted that gender differences result in extensive disparity in the ways males and females respond to therapeutic treatments for a variety of non-gender specific diseases including heart disease and stroke. The reasons for these differences are not well understood, but the menstrual cycle is likely to be at least partially responsible. Much of the research into novel drugs and therapeutic treatments is done using male test subjects. Therefore, there is a great need in the art for methods of incorporating information about the physiological state of a patient into the diagnosis and management of diseases.

Gender differences in the efficacy of drug therapy have been appreciated for many years, but little has been done to investigate these differences. It is believed that hormonal fluctuations within the menstrual cycle may be a primary cause of gender specific drug response. A systematic investigation of the physiological variation throughout the menstrual cycle, both under normal physiological conditions and in response to drug treatment, would be beneficial.

In one embodiment, the current invention correlates information about variation in gene expression with variation in gender. Male and female samples that are matched in other indicators of physiological state are compared to identify genes that are differentially expressed. For example, a healthy 30-year-old male of European descent could be compared to a healthy 30-year-old female of similar, i.e., European, descent to identify genes that are differentially expressed between the two physiological conditions. In a further embodiment the current invention could also be used to monitor changes in pharmacological status resulting from drug treatments, taking normal physiological variation into account. For example, the subjects in the first example could be compared again following therapeutic treatment. The genes that were identified in the first example would be compared to or subtracted from the genes identified in the second example to identify genes that are differentially expressed as a result of the therapy.
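The subtraction described in the preceding example can be sketched with set operations. The function name and the gene identifiers used to exercise it are hypothetical; the sketch assumes each comparison has already been reduced to a collection of differentially expressed gene names.

```python
def therapy_specific_genes(baseline_diff, post_treatment_diff):
    """Genes differentially expressed after treatment that were not already
    differentially expressed between the matched subjects before treatment.

    Both arguments are iterables of gene names. Genes found in the
    baseline comparison reflect the physiological difference (for example
    gender) and are subtracted out.
    """
    return set(post_treatment_diff) - set(baseline_diff)
```

For instance, if the pre-treatment comparison yields genes GENE_A and GENE_B and the post-treatment comparison yields GENE_B and GENE_C, only GENE_C would be attributed to the therapy under this simple scheme.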

In another aspect, the current invention diagnoses diseases of the female reproductive system. Many disorders of the female reproductive system have relatively poor methods of diagnosis and prognosis and many are typically diagnosed based simply on patient perception, which tends to be unreliable. For example, pre-menstrual syndrome affects large numbers of women, but is typically diagnosed only when other explanations for the observed symptoms are eliminated. More reliable methods of diagnosis such as the use of gene expression profiles for diagnosis and prognosis have been complicated by the changes in gene expression that accompany the normal physiological variation of the system.

Menopause is a woman's final menstrual period, but currently the actual event can be determined only in retrospect, after she has not had a period for 12 continuous months. Menopause can occur naturally any time between the mid-30s through the late 50s, but can also be brought on prematurely by events such as gynecological surgery, cancer therapy and certain illnesses and diseases. The current invention can be used to determine a molecular profile consistent with a diagnosis of menopause that would allow earlier diagnosis.

In one embodiment the current invention diagnoses diseases of the female reproductive organs. An expression profile from an experimental sample is compared to expression profiles from reference samples that match the experimental sample in physiological state. The reference samples represent a plurality of different disease states that affect the uterus and the experimental sample is identified as being of the disease state of the reference sample that is the closest match. The samples can be derived from, for example, endometrial tissue, myometrial tissue, and/or uterine tissue.

In one aspect, a database of reference samples could be comprised of expression profiles from endometrial samples and data points identifying the physiological, pharmacological and/or disease state of the samples. These reference samples would be from many different individuals representing many different physiological, pharmacological and/or disease states. The reference samples can be derived from, for example, normal tissue at different stages of development and differentiation, tissues affected with a variety of pathological conditions, including but not limited to, premenstrual syndrome, PMDD, stress urinary incontinence, polycystic ovarian disease, endometriosis, endometrial cancer, infertility, hormone imbalance, and tissue subjected to a variety of perturbations including but not limited to hormone replacement therapy, or chemical contraception. In one preferred embodiment, reference samples will be taken from individuals during routine doctor visits. In one embodiment the reference samples would represent different physiological states of the menstrual cycle including but not limited to the secretory and proliferative stages of the endometrium.

Providing a Nucleic Acid Sample

One of skill in the art will appreciate that it is desirable to have nucleic acid samples containing target nucleic acid sequences that reflect the transcripts of interest. Therefore, suitable nucleic acid samples may contain transcripts of interest. Suitable nucleic acid samples, however, may also contain nucleic acids derived from the transcripts of interest. As used herein, a nucleic acid derived from a transcript refers to a nucleic acid for whose synthesis the mRNA transcript or a subsequence thereof has ultimately served as a template. Thus, a cDNA reverse transcribed from a transcript, an RNA transcribed from that cDNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all derived from the transcript and detection of such derived products is indicative of the presence and/or abundance of the original transcript in a sample. Thus, suitable samples include, but are not limited to, transcripts of the gene or genes, cDNA reverse transcribed from the transcript, cRNA transcribed from the cDNA, DNA amplified from the genes, RNA transcribed from amplified DNA, and the like.

Transcripts, as used herein, may include, but are not limited to, pre-mRNA nascent transcript(s), transcript processing intermediates, mature mRNA(s) and degradation products. It is not necessary to monitor all types of transcripts to practice this invention. For example, one may choose to practice the invention to measure the mature mRNA levels only.

In one embodiment, such a sample is a homogenate of cells or tissues or other biological samples. Preferably, such a sample is a total RNA preparation of a biological sample. More preferably in some embodiments, such a nucleic acid sample is the total mRNA isolated from a biological sample. Those of skill in the art will appreciate that the total mRNA prepared with most methods includes not only the mature mRNA, but also the RNA processing intermediates and nascent pre-mRNA transcripts. For example, total mRNA purified with a poly (T) column contains RNA molecules with poly (A) tails. Those poly A+ RNA molecules could be mature mRNA, RNA processing intermediates, nascent transcripts or degradation intermediates.

Biological samples may be of any biological tissue or fluid or cells. Frequently the sample will be a “clinical sample” which is a sample derived from a patient. Clinical samples provide rich sources of information regarding the various states of genetic network or gene expression. Some embodiments of the invention are employed to detect mutations and to identify the function of mutations. Such embodiments have extensive applications in clinical diagnostics and clinical studies. Typical clinical samples include, but are not limited to, sputum, blood, blood cells (e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues such as frozen sections taken for histological purposes.

Another typical source of biological samples is cell cultures where gene expression states can be manipulated to explore the relationship among genes. In one aspect of the invention, methods are provided to generate biological samples reflecting a wide variety of states of the genetic network.

One of skill in the art would appreciate that it is desirable to inhibit or destroy RNase present in homogenates before homogenates can be used for hybridization. Methods of inhibiting or destroying nucleases are well known in the art. In some preferred embodiments, cells or tissues are homogenized in the presence of chaotropic agents to inhibit nuclease. In some other embodiments, RNases are inhibited or destroyed by heat treatment followed by proteinase treatment.

Methods of isolating total mRNA are also well known to those of skill in the art. For example, methods of isolation and purification of nucleic acids are described in detail in Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, P. Tijssen, ed., Elsevier, N.Y. (1993).

In a preferred embodiment, the total RNA is isolated from a given sample using, for example, an acid guanidinium-phenol-chloroform extraction method and polyA⁺ mRNA is isolated by oligo dT column chromatography or by using (dT)n magnetic beads (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor Laboratory (1989), or Current Protocols in Molecular Biology, F. Ausubel et al., ed., Greene Publishing and Wiley-Interscience, New York (1987)). See also PCT/US99/25200 for complexity management and other sample preparation techniques, which is hereby incorporated by reference in its entirety.

Frequently, it is desirable to amplify the nucleic acid sample prior to hybridization. One of skill in the art will appreciate that whatever amplification method is used, if a quantitative result is desired, care must be taken to use a method that maintains or controls for the relative frequencies of the amplified nucleic acids to achieve quantitative amplification.

Methods of “quantitative” amplification are well known to those of skill in the art. For example, quantitative PCR involves simultaneously co-amplifying a known quantity of a control sequence using the same primers. This provides an internal standard that may be used to calibrate the PCR reaction. The high density array may then include probes specific to the internal standard for quantification of the amplified nucleic acid.
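The use of a co-amplified control as an internal standard can be illustrated with a simple proportional calibration. This sketch assumes that the measured signal is proportional to quantity and that the target and the control amplify with equal efficiency; real assays may require a standard curve or efficiency correction. The function and parameter names are hypothetical.

```python
def calibrate_to_internal_standard(target_signal, control_signal, control_quantity):
    """Estimate the quantity of a target from its signal, using a control
    sequence of known quantity that was co-amplified in the same reaction.

    Assumes signal is proportional to quantity and that target and control
    amplify equally (simplifying assumptions of this sketch).
    """
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return target_signal * control_quantity / control_signal
```

Under these assumptions, a target whose signal is twice that of a control present at 10 units would be estimated at 20 units.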

Other suitable amplification methods include, but are not limited to, polymerase chain reaction (PCR) (Innis, et al., PCR Protocols: A Guide to Methods and Applications, Academic Press, Inc., San Diego (1990)), ligase chain reaction (LCR) (see Wu and Wallace, Genomics, 4: 560 (1989), Landegren, et al., Science, 241: 1077 (1988) and Barringer, et al., Gene, 89: 117 (1990)), transcription amplification (Kwoh, et al., Proc. Natl. Acad. Sci. USA, 86: 1173 (1989)), and self-sustained sequence replication (Guatelli, et al., Proc. Natl. Acad. Sci. USA, 87: 1874 (1990)).

Cell lysates or tissue homogenates often contain a number of inhibitors of polymerase activity. Therefore, RT-PCR typically incorporates preliminary steps to isolate total RNA or mRNA for subsequent use as an amplification template. One tube mRNA capture methods may be used to prepare poly(A)+ RNA samples suitable for immediate RT-PCR in the same tube (Boehringer Mannheim). The captured mRNA can be directly subjected to RT-PCR by adding a reverse transcription mix and, subsequently, a PCR mix. In a particularly preferred embodiment, the sample mRNA is reverse transcribed with a reverse transcriptase and a primer consisting of oligo dT and a sequence encoding the phage T7 promoter to provide single stranded DNA template. The second DNA strand is polymerized using a DNA polymerase. After synthesis of double-stranded cDNA, T7 RNA polymerase is added and RNA is transcribed from the cDNA template. Successive rounds of transcription from each single cDNA template result in amplified RNA. Methods of in vitro polymerization are well known to those of skill in the art (see, e.g., Sambrook, supra).

It will be appreciated by one of skill in the art that the direct transcription method described above provides an antisense (aRNA) pool. Where antisense RNA is used as the target nucleic acid, the oligonucleotide probes provided in the array are chosen to be complementary to subsequences of the antisense nucleic acids. Conversely, where the target nucleic acid pool is a pool of sense nucleic acids, the oligonucleotide probes are selected to be complementary to subsequences of the sense nucleic acids. Finally, where the nucleic acid pool is double stranded, the probes may be of either sense as the target nucleic acids include both sense and antisense strands.

The protocols cited above include methods of generating pools of either sense or antisense nucleic acids. Indeed, one approach can be used to generate either sense or antisense nucleic acids as desired. For example, the cDNA can be directionally cloned into a vector (e.g., Stratagene's pBluescript II KS (+) phagemid) such that it is flanked by the T3 and T7 promoters. In vitro transcription with the T3 polymerase will produce RNA of one sense (the sense depending on the orientation of the insert), while in vitro transcription with the T7 polymerase will produce RNA having the opposite sense. Other suitable cloning systems include phage lambda vectors designed for Cre-loxP plasmid subcloning (see, e.g., Palazzolo et al., Gene, 88:25-36 (1990)).

Other analysis methods that can be used in the present invention include electrochemical denaturation of double-stranded nucleic acids, U.S. Pat. Nos. 6,045,996 and 6,033,850; the use of multiple arrays (arrays of arrays), U.S. Pat. No. 5,874,219; the use of scanners to read the arrays, U.S. Pat. Nos. 5,631,734, 5,744,305, 5,981,956 and 6,025,601; methods for mixing fluids, U.S. Pat. No. 6,050,719; an integrated device for reactions, U.S. Pat. No. 6,043,080; an integrated nucleic acid diagnostic device, U.S. Pat. No. 5,922,591; and nucleic acid affinity columns, U.S. Pat. No. 6,013,440. All of the above patents are hereby incorporated by reference in their entireties.

Throughout this disclosure, various aspects of this invention are presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

All publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference. Although the present invention has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.

The above disclosure generally describes the present invention. A more complete understanding can be obtained by reference to the following specific examples, which are provided herein for purposes of illustration only and are not intended to limit the scope of the invention.

EXAMPLES

The following examples are offered to illustrate, but not to limit, the present invention.

Example 1

Detection of genes differentially expressed in the secretory and proliferative stage endometrium. This is the first report of differential expression between two different physiological states in human subjects. The data obtained from this experiment demonstrate that there are differences in gene expression between different physiological states in humans. These differences are large enough to be detected by arrays, and the number of genes changed is substantial but manageable, making the information useful for diagnostic and prognostic applications.

Experiments were designed to identify genes that were differentially expressed in the physiologically distinct secretory and proliferative stages of endometrial tissue. Gene expression levels were quantitatively measured in tissue samples using high-density oligonucleotide arrays containing probes representing approximately 6800 full-length human genes (commercially available from Affymetrix, Santa Clara). Probe arrays (DNA chips) of this type have been shown to behave quantitatively with high specificity and sensitivity (Lockhart, D. J. et al., 1996, Nat Biotech 14:1657-1680). See also U.S. Pat. No. 6,040,138. The probe sequences were based on information from public sequence databases, such as GenBank. Samples derived from secretory and proliferative stage endometrium were hybridized to the probe arrays, and the relative concentrations of more than 6800 human genes were measured simultaneously. The RNAs were classified by relative abundance, and differentially expressed genes were identified by direct comparison. Using this method one can produce, in a relatively short period of time, a quantitative representation of gene expression for a plurality of different physiological states for a plurality of different cell or tissue types.

TABLE 1 Genes downregulated in proliferative vs secretory endometrium

Probe Set | Avg Diff Change | Fold Change | Entrez Definition
HG721-HT4828 | −13342 | −56.6 | Placental Protein 14, Endometrial Alpha 2 Globulin, Alt Splice 3
HG721-HT4827 | −12704 | −45.5 | Placental Protein 14, Endometrial Alpha 2 Globulin, Alt Splice 2
D00632 | −8069 | −27.7 | Human plasma (extracellular) mRNA for glutathione peroxidase, complete cds
X64177 | −11610 | −39.3 | H. sapiens mRNA for metallothionein
X04470 | −6159 | −16.8 | Human mRNA for antileukoprotease (ALP) from cervix uterus
M83667 | −5851 | −14.9 | Human NF-IL6-beta protein mRNA, complete cds
M13690 | −5998 | −13.8 | Human plasma protease (C1) inhibitor mRNA, complete cds
Y10032 | −3126 | −13.4 | H. sapiens mRNA for putative serine/threonine protein kinase
X65965 | −1760 | −11.1 | H. sapiens SOD-2 gene for manganese superoxide dismutase
M34455 | −3551 | −8.6 | Human interferon-gamma-inducible indoleamine 2,3-dioxygenase (IDO) mRNA, complete cds
U10117 | −1315 | −10.8 | Human endothelial-monocyte activating polypeptide II mRNA, complete cds
D15050 | −668 | −5 | Human mRNA for transcription factor AREB6, complete cds
K02765_at | −8578 | −8.1 | Human complement component C3 mRNA, alpha and beta subunits, complete cds
M59815 | −6015 | −7.3 | Human complement component C4A gene
J02611 | −10627 | −8.8 | Human apolipoprotein D mRNA, complete cds
M60974 | −977 | −5.2 | Human growth arrest and DNA-damage-inducible protein (gadd45) mRNA, complete cds
U28368 | −1357 | −6.7 | Human Id-related helix-loop-helix protein Id4 mRNA, complete cds
M85276 | −10891 | −14.3 | Homo sapiens NKG5 gene, complete cds
HG2981-HT3127 | −722 | −7.1 | Epican, Alt Splice 11
X92744 | −3311 | −10.6 | H. sapiens mRNA for hBD-1 protein
X97324 | −1020 | −7.4 | H. sapiens mRNA for adipophilin /gb = X97324 /ntype = RNA
M55153 | −2002 | −6.5 | Human transglutaminase (TGase) mRNA, complete cds
M61916 | −1907 | −5.8 | Human laminin B1 chain mRNA, complete cds
M55543 | −370 | *−4.3 | Human guanylate binding protein isoform II (GBP-2) mRNA, complete cds
M97796 | −3808 | −5.4 | Human helix-loop-helix protein (Id-2) mRNA, complete cds
M13929 | −1042 | −5.6 | Human c-myc-P64 mRNA, initiating from promoter P0, (HLmyc2.5) partial cds
L00058 | −886 | −5.5 | Human (GH) germline c-myc proto-oncogene, 5′ flank
U02556 | −3709 | −9.2 | Human RP3 mRNA, complete cds
J04080 | −8872 | −6.2 | Human complement component C1r mRNA, complete cds
U08989 | −984 | −6.4 | Human glutamate transporter mRNA, complete cds
X16396 | −615 | −5.9 | Human mRNA for NAD-dependent methylene tetrahydrofolate dehydrogenase cyclohydrolase (EC 1.5.1.15)
M26062 | −1378 | −11.6 | Human interleukin 2 receptor beta chain (p70-75) mRNA, complete cds
U63455 | −1401 | −6.3 | Human sulfonylurea receptor (SUR1) gene
X69699 | −4265 | −5.4 | H. sapiens Pax8 mRNA
M24069 | −1090 | −6.1 | Human DNA-binding protein A (dbpA) gene, 3′ end
M27492 | −2004 | −4.6 | Human interleukin 1 receptor mRNA, complete cds
M14058 | −4723 | −5.4 | Human complement C1r mRNA, complete cds
S37730 | −6004 | −6.9 | insulin-like growth factor binding protein-2 [human, placenta, Genomic, 4575 nt 4 segments]
U46499 | −1951 | −4.5 | Human microsomal glutathione transferase (GST12) gene, 5′ sequence
M21574 | −2474 | −6 | Human platelet-derived growth factor receptor alpha (PDGFRA) mRNA, complete cds
U65093 | −1037 | −6.9 | Human msg1-related gene 1 (mrg1) mRNA, complete cds
X76717 | −5904 | −8.5 | H. sapiens MT-11 mRNA
U33147 | −976 | −7.5 | Human mammaglobin mRNA, complete cds
U09284 | −1158 | −4.5 | Human PINCH protein mRNA, complete cds
M94856 | −1105 | −4.9 | Human fatty acid binding protein homologue (PA-FABP) mRNA, complete cds
X65614 | −5954 | −28.9 | H. sapiens mRNA for calcium-binding protein S100P
Z68228 | −2909 | −5 | H. sapiens mRNA for plakoglobin
D87953 | −2386 | −4.4 | Human mRNA for RTP, complete cds
K02574 | −1199 | −4 | Human purine nucleoside phosphorylase (PNP) mRNA, complete cds
X05908 | −744 | −4.4 | Human mRNA for lipocortin
U21936 | −1382 | *−13.4 | Human peptide transporter (HPEPT1) mRNA, complete cds
J04102 | −689 | *−7.2 | Human erythroblastosis virus oncogene homolog 2 (ets-2) mRNA, complete cds
M62486 | −1190 | *−11.7 | Human C4b-binding protein gene
Z19002 | −1444 | *−14.0 | H. sapiens of PLZF gene encoding kruppel-like zinc finger protein
X57348 | −1411 | *−13.7 | H. sapiens mRNA (clone 9112)
M15958 | −1554 | *−15.0 | Human gastrin gene, complete cds
M13955 | −363 | *−4.3 | Human mesothelial keratin K7 (type II) mRNA, 3′ end
U51010 | −1007 | *−10.1 | Human nicotinamide N-methyltransferase gene, exon 1 and 5′ flanking region /gb = U51010 /ntype = DNA /annot = exon
M92357 | −1377 | −11.6 | Homo sapiens B94 protein mRNA, complete cds
M38591 | −1176 | *−11.6 | Homo sapiens cellular ligand of annexin II (p11) mRNA, complete cds
U20758 | −1314 | *−12.8 | Human osteopontin gene, complete cds
M13699 | −2125 | *−20.1 | Human ceruloplasmin (ferroxidase) mRNA, complete cds
J05068 | −1459 | *−14.1 | human transcobalamin I mRNA, complete cds
L32137 | −1246 | *−12.2 | Human germline oligomeric matrix protein (COMP) mRNA, complete cds
U08021 | −2064 | *−19.6 | Human nicotinamide N-methyltransferase (NNMT) mRNA, complete cds
U07919 | −3912 | *−36.2 | Human aldehyde dehydrogenase 6 mRNA, complete cds
M84526 | −3742 | *−34.7 | Human adipsin/complement factor D mRNA, complete cds
X96719 | −388 | *−4.5 | H. sapiens mRNA for AICL (activation-induced C-type lectin)
L09235 | −895 | *−9.0 | Human vacuolar ATPase (isoform VA68) mRNA, complete cds
U14528 | −499 | *−5.5 | Human sulfate transporter (DTD) mRNA, complete cds
HG4321-HT4591 | −336 | *−4.0 | Ahnak-Related Sequence
X95240 | −591 | *−6.3 | H. sapiens mRNA for cysteine-rich secretory protein-3
X58079 | −388 | *−4.5 | Human mRNA for S100 alpha protein
U42031 | −696 | *−7.3 | Human 54 kDa progesterone receptor-associated immunophilin FKBP54 mRNA, partial cds
M31516 | −841 | *−8.6 | Human decay-accelerating factor mRNA, complete cds
X92814 | −669 | *−7.0 | H. sapiens mRNA for rat HREV107-like protein
X87342 | −581 | *−6.2 | H. sapiens mRNA for human giant larvae homolog
U60873 | −388 | *−4.5 | Human clone 137308 mRNA, partial cds
U00115 | −622 | *−6.6 | Human zinc-finger protein (bcl-6) mRNA, complete cds
L11005 | −422 | *−4.8 | Human aldehyde oxidase (hAOX) mRNA, complete cds
Z26653 | −1126 | *−11.1 | H. sapiens mRNA for laminin M chain (merosin)
D31762 | −739 | *−7.6 | Human mRNA for KIAA0057 gene, complete cds
U25997 | −728 | *−7.5 | Human stanniocalcin precursor (STC) mRNA, complete cds
U17760 | −700 | *−7.3 | Human laminin S B3 chain (LAMB3) gene
U26173 | −615 | *−6.5 | Human bZIP protein NF-IL3A (IL3BP1) mRNA, complete cds
M57730 | −827 | *−8.4 | Human B61 mRNA, complete cds
M22430 | −1069 | *−10.6 | Human RASF-A PLA2 mRNA, complete cds
V00594 | −15429 | *−139.7 | Human mRNA for metallothionein from cadmium-treated cells

*Expression level in proliferative was close to 0; indistinguishable from background.

TABLE 2 Genes upregulated in proliferative vs secretory

Accession number | Avg Diff Change | Fold Change | Entrez Definition
M34516 | 4759 | *43.8 | Human omega light chain protein 14.1 (Ig lambda chain related) gene
M63438 | 8960 | *81.6 | Human Ig rearranged gamma chain mRNA, V-J-C region and complete cds
L22524 | 717 | *7.4 | Human matrilysin gene
X57766 | 6160 | 7 | Human stromelysin-3 mRNA
M16364 | 1877 | *17.9 | Human creatine kinase-B mRNA, complete cds
M68516 | 910 | 6.7 | PCI gene (plasminogen activator inhibitor 3) extracted from Human protein C inhibitor gene, complete cds
J04970 | 998 | *10.0 | Human carboxypeptidase M, 3′ end
U83411 | 1586 | *15.3 | Homo sapiens carboxypeptidase Z precursor, mRNA, complete cds
U79299 | 1359 | 6 | Human neuronal olfactomedin-related ER localized protein mRNA, partial cds
L38517 | 1527 | *14.7 | Homo sapiens indian hedgehog protein (IHH) mRNA, 5′ end
M96789 | 914 | *9.2 | Homo sapiens connexin 37 (GJA4) mRNA, complete cds
AFFX-HUMRGE/M10098_M_at | 6662 | 16.2 |
AFFX-HUMRGE/M10098_5_at | 3192 | 13.7 |

*Expression level in secretory was close to 0; indistinguishable from background.
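The two numeric columns in Tables 1 and 2 can be read as a difference and a signed ratio between the two tissue states. The sketch below shows one way such values could be computed; it is not the calculation used to produce the tables, which the text does not specify. The function name, the noise floor of 100, and all intensities are assumptions made for illustration. The floor only stands in for the footnoted case where one state is indistinguishable from background.

```python
def signed_fold_change(baseline: float, experiment: float, noise_floor: float = 100.0):
    """Return (difference, signed fold change, floored flag).

    `baseline` and `experiment` are hypothetical average intensities for one
    gene (for example secretory and proliferative tissue). Values below
    `noise_floor` (an assumed threshold) are raised to the floor before the
    ratio is taken, so a near-zero level does not give an unbounded ratio.
    """
    change = experiment - baseline
    b = max(baseline, noise_floor)
    e = max(experiment, noise_floor)
    floored = baseline < noise_floor or experiment < noise_floor
    # Positive ratio for increases, negative reciprocal ratio for decreases.
    fold = e / b if e >= b else -(b / e)
    return change, fold, floored

# Made-up intensities, not values from the tables.
print(signed_fold_change(1000.0, 250.0))   # decrease, both above the floor
print(signed_fold_change(2000.0, 20.0))    # decrease to near background
print(signed_fold_change(50.0, 800.0))     # increase from near background
```

Under this convention a flagged (floored) fold change is a lower bound on the magnitude of the true change, which is one reasonable reading of the asterisked entries.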

Example 2

Detection of gene expression changes in endometrial cancer (adenocarcinoma and clear cell carcinoma). The goals of the experiment were an improved understanding of etiology, identification of candidate genes, improved diagnostics, and improved therapeutics. Tissues used in the study were 4 matched adenocarcinomas, surgically obtained with matching normal tissue. All patients were of Northern European descent. Total RNA was isolated from surgical samples and hybridized to Affymetrix HuGene FL arrays as in Example 1.

TABLE 3 Tissue sample origin

Patient number | Age in years | Normal sample | Tumor sample
87 | 57 | Endometrium | Grade III
106 | 65 | Benign tumor | Grade III
119 | 75 | Endometrium | Grade III
122 | 75 | Endometrium | Grade III

TABLE 4 Genes expressed in normal v. endometrial tumor

 | normal | tumor
Transcripts detected | 989 | 835
Unique transcripts | 318 | 164
Shared detected | 671 | 671
Shared absent | 2736 | 2736
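The counts in Table 4 are related by simple set arithmetic: the unique transcripts in each tissue are those detected there minus those detected in both. A minimal sketch follows, using synthetic integer identifiers sized only to reproduce the table's counts; the function name and the identifiers are hypothetical and do not correspond to real probe sets.

```python
def detection_summary(normal_present, tumor_present, all_transcripts):
    """Summarise present/absent calls for a matched normal/tumor comparison."""
    normal, tumor = set(normal_present), set(tumor_present)
    return {
        "detected": (len(normal), len(tumor)),
        "unique": (len(normal - tumor), len(tumor - normal)),
        "shared_detected": len(normal & tumor),
        "shared_absent": len(set(all_transcripts) - (normal | tumor)),
    }

# Synthetic IDs: 671 shared, 318 normal-only, 164 tumor-only, 2736 absent in both.
shared = range(0, 671)
normal_only = range(671, 989)
tumor_only = range(989, 1153)
everything = range(0, 1153 + 2736)

summary = detection_summary(
    list(shared) + list(normal_only),
    list(shared) + list(tumor_only),
    everything,
)
print(summary)
```

With these inputs the summary gives 989 and 835 detected, 318 and 164 unique, 671 shared detected and 2736 shared absent, matching the rows of Table 4.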

TABLE 5 Genes differentially expressed in endometrial tumors Present inNormals/ Absent in Tumors Absent in Normals/Present in Tumors KIAA0367KIAA0119 Platelet activating factor acetylhydrolase 1B gamma- subunitUDP-galactose transporter related isozyme High mobility group protein(HMG-1) Lamin B

TABLE 6 Genes differentially expressed by at least 4 fold in endometrialtumors Cyclin-selective ubiquitin Alpha topoisomerase truncated-formcarrier Connexin Carcinoma-associated antigen PKC zeta, thymidine kinase+2 p78 Keratin 7 Thyroid receptor interactor (TRIP7) NGAL Nucleolarprotein p40 Gastrointestinal tumor-associated antigen Sm protein FCyclin A1 Ezrin Placental bikunin Cycstatin B SIX1 Placental protein 15Chaperonin 10 Retinoic acid inducible factor Diazepam binding inhibitorSplicing factor SRp30c Histone H2B 1 MLN62 Nm23 BST-2 Thymosin beta-10Ribosomal protein L3 Nuclear localization sequence receptor PAX8hSRP1alpha Stromelysin 3 Transformation-sensitive protein Tumorrejection antigen gp96 Mitochondrial matrix protein P1 Inositolpolyphosphate 5-phosphatase, HSP 90 beta-glucuronidase DM kinase, Flt4tyrosine kinase, ERK1, Oncoprotein 18 protein serine/threonine kinase,Protein Kinase Ht31, creatine kinase-B KIAA0015, 172, 136, 092, 073,382, 263, PDGF receptor, steroid 084, 239, Unknown - 2 hormone receptorNer-I Glycine-rich RNA binding protein Splicing factor, SF1-Bo isoformHomeotic Protein Pl2 Serum response factor (SRF), ERF-2, guaninenucleotide regulatory factor (LFP40), PSE- binding factor PTF deltasubunit Non-muscle alpha-actinin Collagen VI alpha-2 Cisplatinresistance associated alpha MTG8a protein (hCRA alpha) Serum constituentprotein (MSE55) MADER CDC42 GTPase-activating protein ARP-1 Guaninenucleotide-binding regulatory HOX 5 1 protein (G-y-alpha))Ubiquitin-activating enzyme E1 related p126 (ST5) proteinHNF-3/fork-head homolog-3 HFH-3 Archain PKD1 Bcl-6 Cyclin 1 SATB1 HCG V

Example 3

Matched normal tissue and adenocarcinoma or clear cell carcinomas ranging from Grade I to III were collected from more than 10 patients. Total RNA was used as starting material for the preparation of fluorescently labeled nucleic acid targets from all of the samples. The labeled targets were hybridized to high-density DNA microarrays containing probes representing ˜6800 full-length human genes (Affymetrix, Inc., Santa Clara, Calif.). Sample preparation and hybridization were carried out in a manner similar to the examples and description above. Differential gene expression patterns in both of the subtypes of endometrial cancer were identified.

TABLE 7 Differential expression in endometrial cancers

Column headings: Normal, Adenocarcinoma, Clear cell carcinoma

MIF − +
Cyclin A1 − +
MRG1 + −
HOX1 + −
Alpha 2 collagen type VI + +/−
Adducin + +/− +/−
Cyclin B +/− +
PKC zeta +/− +
Calponin + −
Caldesmon + −
Keratin K17 − +
ESE-1b − +
HMG1 − +
LAMB3 +/− +
Laminin SB3 +/− +
Osteopontin +/− +
decorin + +/−

([−], no expression observed; [+], expression observed; [+/−], expression observed but lower than corresponding matched sample)
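A marker pattern of this kind lends itself to a simple matched-pair check. The sketch below is an illustration only, not a validated diagnostic: the gene directions follow the adenocarcinoma claim at the end of this document (MIF, cyclin A1, cyclin B and PKC zeta higher; MRG1, HOX1, alpha 2 collagen type VI and adducin lower), while the function name and all expression values are hypothetical.

```python
UP_IN_TUMOR = ("MIF", "Cyclin A1", "Cyclin B", "PKC zeta")
DOWN_IN_TUMOR = ("MRG1", "HOX1", "Alpha 2 collagen type VI", "Adducin")

def matches_adenocarcinoma_pattern(sample: dict, matched_normal: dict) -> bool:
    """True if every marker moves in the expected direction relative to the matched normal."""
    up = all(sample[g] > matched_normal[g] for g in UP_IN_TUMOR)
    down = all(sample[g] < matched_normal[g] for g in DOWN_IN_TUMOR)
    return up and down

# Made-up expression measurements for one matched pair.
normal = {"MIF": 50, "Cyclin A1": 40, "Cyclin B": 300, "PKC zeta": 250,
          "MRG1": 900, "HOX1": 700, "Alpha 2 collagen type VI": 1500, "Adducin": 800}
tumor = {"MIF": 1200, "Cyclin A1": 600, "Cyclin B": 900, "PKC zeta": 700,
         "MRG1": 60, "HOX1": 45, "Alpha 2 collagen type VI": 400, "Adducin": 300}

print(matches_adenocarcinoma_pattern(tumor, normal))   # all eight markers move as expected
print(matches_adenocarcinoma_pattern(normal, normal))  # no change in any marker
```

Requiring all eight markers to agree is the strictest reading of the pattern; a practical implementation would also need a threshold for what counts as a higher or lower measurement, which the sketch leaves out.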

Example 4

GeneCluster. Self-organizing maps (SOMs) are a type of mathematical cluster analysis that is particularly well suited for recognizing and classifying features in complex, multidimensional data. The method has been implemented in a publicly available computer package, GENECLUSTER, that performs the analytical calculations and provides easy data visualization. GENECLUSTER was used to organize the genes into biologically relevant clusters. SOMs have a number of features that make them particularly well suited to clustering and analysis of gene expression patterns. They are ideally suited to exploratory data analysis, allowing one to impose partial structure on the clusters (in contrast to the rigid structure of hierarchical clustering, the strong prior hypotheses used in Bayesian clustering, and the nonstructure of k-means clustering) and facilitating easy visualization and interpretation. SOMs have good computational properties and are easy to implement, reasonably fast, and scalable to large data sets. SOMs have been well studied and empirically tested on a wide variety of problems.

The microarray data, though voluminous, can be analyzed by pattern recognition (clustering) software to aid in deriving lists of genes that distinguish and characterize disease versus normal biopsies, thus shedding light on molecular genetic profiles and ultimately the mechanism of the disease under study. Techniques used for clustering include self-organizing maps (SOM), Bayesian, hierarchical, and k-means. SOM was selected for our analysis because of its advantages in initial exploration of the data, allowing the operator to impose partial structure on the clusters. Other advantages of SOM include good computational properties, computational speed and easy implementation.
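To make the idea of a self-organizing map concrete, the following is a minimal NumPy sketch of SOM training on expression profiles. It is not the GENECLUSTER implementation; the grid size, learning-rate and neighborhood schedules, function names, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def train_som(data, rows=2, cols=2, n_iter=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns node weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    # Grid coordinates of each node, used to measure neighborhood distance.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each node at a randomly chosen expression profile.
    weights = data[rng.integers(0, n_samples, size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3     # neighborhood radius shrinks
        x = data[rng.integers(n_samples)]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching node
        grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Move the best-matching node and its grid neighbors toward the profile.
        weights += lr * influence[:, None] * (x - weights)
    return weights

def assign_clusters(data, weights):
    """Assign each profile to its nearest SOM node."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Synthetic data: 20 "genes" low then high across 4 samples, 20 with the reverse pattern.
rng = np.random.default_rng(1)
group_a = np.array([-1.0, -1.0, 1.0, 1.0]) + 0.1 * rng.standard_normal((20, 4))
group_b = np.array([1.0, 1.0, -1.0, -1.0]) + 0.1 * rng.standard_normal((20, 4))
profiles = np.vstack([group_a, group_b])

weights = train_som(profiles)
labels = assign_clusters(profiles, weights)
print(labels)
```

The grid is what lets the operator impose partial structure: the number and arrangement of nodes are chosen in advance, and neighboring nodes are pulled toward similar profiles during training.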

GeneCluster analysis was conducted on endometrial cancer samples. It revealed expression patterns delineating normal from tumor tissues. Hierarchical clustering using GenExplore distinguished the majority of the normal and tumor tissues.

CONCLUSION

From the foregoing it can be seen that the advantage of the present invention is that samples that differ from one another in multiple variables can be analyzed in such a way as to account for the variables and to focus on elements that are under investigation, such as disease state, for example. Comparison of matched samples eliminates gene expression differences that are the result of changes in variables that are not of interest. The gene expression differences that remain can be attributed with a high degree of confidence to the unmatched variation. The gene expression differences thus identified can be used, for example, to diagnose disease, identify physiological state, design drugs, and monitor therapies.


1-9. (canceled)
10. A method of diagnosing endometrial adenocarcinoma in a first endometrial sample comprising: measuring the gene expression in the first endometrial sample of each of the following genes: MIF, cyclin A1, MRG1, HOX1, Alpha 2 collagen type VI, Adducin, Cyclin B and PKC zeta, to obtain a gene expression measurement for each of said genes in the first endometrial sample; comparing the gene expression measurement of each of said genes in the first endometrial sample to a gene expression measurement of the same gene in a second endometrial sample from normal endometrium; and identifying the first endometrial sample as being endometrial adenocarcinoma if MIF, cyclin A1, cyclin B1 and PKC zeta are expressed at a higher level in the first endometrial sample than in the second endometrial sample and MRG1, HOX1, Alpha 2 collagen type VI and Adducin are expressed at a lower level in the first endometrial sample than in the second endometrial sample.